12.06.2019

Introduction

Results of our little survey:

Field                            Research data and methods
Bibliometrics                    Statistics, coding, prose
Game studies                     Qualitative, game logs
Mathematics                      Coding, pen and paper
Petroleum engineering            Statistics, risk analysis
Energy and process engineering   Quantitative and qualitative
Digital transformation           Text data, modelling
Computer science                 Algorithms, sensor data
Learning analytics               Sensor data
Humanities                       Video data
Cybernetics                      Simulation
Archeology
Digital infrastructure
Decision sciences                Coding

Today's topics:

  • The modern research cycle
  • The Open Science movement
  • Incorporating workflow thinking into your research

Part I: The modern research cycle

An idealised research project

As a researcher, in many ways, this is how you would want the ideal research project to look:

Requirements

…but then, you are not the one with the money (yet).

Many research funders now require you to produce a heap of auxiliary documents about the project:

  • Data management plan
  • Publication plan
  • Dissemination plan

Data management plan

Research data should be shared and reused more widely […] Better access to research data can boost innovation and value creation by enabling actors outside the research community to find new areas of application.

National strategy on access to and sharing of research data

  • What do you collect?
  • How do you treat it?
  • How will you keep/share it?

Publication plan

  • Where do you plan to publish?
  • What part of the project will make it into which publications?
  • How do the publications fit into the overall project?

The publishing cycle

The publishing cycle, really

Dissemination plan

  • How will you present your research?
  • In which channels?

Social media

A more realistic project plan

Is this you?

Why all this stuff?

[W]e have two major points to consider. First, due to a lack of adequate incentives in the reward structure of professional science […] actual replication attempts are rarely carried out. Second, to the extent that they are carried out, it can be well-nigh impossible to say conclusively what they mean, whether they are “successful” (i.e., showing similar, or apparently similar, results to the original experiment) or “unsuccessful” (i.e., showing different, or apparently different, results to the original experiment).

Earp, B. and D. Trafimow (2015) Replication, falsification, and the crisis of confidence in social psychology. Frontiers in Psychology

This is the whole abstract of an interesting paper in the field of genomic biology:

The spreadsheet software Microsoft Excel, when used with default settings, is known to convert gene names to dates and floating-point numbers. A programmatic scan of leading genomics journals reveals that approximately one-fifth of papers with supplementary Excel gene lists contain erroneous gene name conversions.

Ziemann, M., Y. Eren, A. El-Osta (2016) Gene name errors are widespread in the scientific literature. Genome Biology 17:177

Storytime

Here are a few rows from some of the columns of one of my datasets:

s4   s6   s7   s8   s9
4    4    1    NA   46
3    1    1    NA   125
3    1    1    NA   90
3    3    1    NA   156
4    5    1    NA   78

  • Only problem: I don't know where I put the codebook!
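One low-effort safeguard is to keep the codebook in a machine-readable file saved right next to the data, so the two always travel together. The sketch below assumes a CSV data file and a JSON codebook; the variable names s4 to s9 come from the rows above, but every label and value meaning in the codebook is hypothetical, since the real codebook is (of course) lost.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical codebook: the labels and value meanings are invented for
# illustration; only the variable names s4 and s9 come from the example rows.
CODEBOOK = {
    "s4": {
        "label": "Agreement with statement A (hypothetical)",
        "values": {"1": "Strongly disagree", "2": "Disagree",
                   "3": "Agree", "4": "Strongly agree"},
    },
    "s9": {"label": "Minutes spent on task (hypothetical)", "values": None},
}

RAW = "s4,s6,s7,s8,s9\n4,4,1,NA,46\n3,1,1,NA,125\n"


def decode(rows, codebook):
    """Replace coded values with labels wherever the codebook defines them."""
    decoded = []
    for row in rows:
        new_row = {}
        for var, value in row.items():
            entry = codebook.get(var)
            if entry and entry["values"]:
                # Codes missing from the codebook are kept, not silently dropped
                new_row[var] = entry["values"].get(value, value)
            else:
                new_row[var] = value
        decoded.append(new_row)
    return decoded


with tempfile.TemporaryDirectory() as tmp:
    data_path = Path(tmp) / "survey.csv"
    data_path.write_text(RAW)
    # Save the codebook alongside the data file, sharing its base name
    codebook_path = data_path.parent / (data_path.stem + ".codebook.json")
    codebook_path.write_text(json.dumps(CODEBOOK, indent=2))

    with open(data_path, newline="") as f:
        rows = list(csv.DictReader(f))
    reloaded = json.loads(codebook_path.read_text())
    decoded = decode(rows, reloaded)

print(decoded[0]["s4"])  # Strongly agree
```

The point is less the decoding function than the habit: if `survey.codebook.json` sits in the same folder (and the same repository) as `survey.csv`, it is much harder to lose one without the other.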

An all-too familiar story

Part II: How to deal with this?

Just don’t do it

  • Research habits change very slowly
    • Being too far ahead of the pack can be more work than it’s worth
  • Funder mandates are actually minimal
  • There are few repercussions, if any

Try to integrate the issues into your workflow

Can I set up my workflow in a way that is

  • resource-efficient
  • modular
  • transparent and accountable

If I can, how much time and effort is it worth?

Part III: Examples of digital workflows

The general idea

Collaborating

What is a modern way to ensure that the work I do with others is always up to date and always available to everyone I collaborate with?

  • SharePoint (requires institutional credentials)
  • Overleaf (becomes better with knowledge of the world's worst markup language, LaTeX)
  • Jupyter notebooks / R Markdown (become better with coding knowledge)

Keeping track

Version control software preserves the integrity and the full history of the text and any other files you track with it. Used properly, it guards against loss of work and goes a long way towards ensuring transparency and accountability in the research process.

This is also a superior way of doing collaboration, as a repository can hold auxiliary files in addition to the text being edited, and all of it can be sent/updated in an integrated fashion.
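To make the integrity idea concrete, here is a toy sketch of content-addressed snapshots, a principle Git also builds on: each saved version is identified by a hash of its contents, so any change to the text yields a new identifier, earlier versions stay retrievable, and silent corruption can be detected by re-hashing. This is a teaching illustration only, not how Git actually stores data; the `ToyRepository` class and its methods are hypothetical.

```python
import hashlib


class ToyRepository:
    """A toy, in-memory version store illustrating content addressing."""

    def __init__(self):
        self.objects = {}   # hash -> stored text
        self.history = []   # (hash, message) pairs, oldest first

    def commit(self, text, message):
        # The identifier is derived from the content itself
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        self.objects[digest] = text
        self.history.append((digest, message))
        return digest

    def checkout(self, digest):
        # Any earlier version can be recovered by its identifier
        return self.objects[digest]

    def verify(self):
        """Re-hash every stored version to detect corruption."""
        return all(
            hashlib.sha256(text.encode("utf-8")).hexdigest() == digest
            for digest, text in self.objects.items()
        )


repo = ToyRepository()
v1 = repo.commit("Draft introduction.", "First draft")
v2 = repo.commit("Revised introduction.", "Address co-author comments")
print(repo.checkout(v1))  # Draft introduction.
print(len(repo.history))  # 2
```

In practice you would use Git (or a similar tool) directly rather than anything like this; the sketch only shows why a hash-based history makes accidental loss or undetected change of tracked files unlikely.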

Documenting

If you treat your current publication as a small project in its own right, documentation can be done concurrently with project development. This requires a small upfront investment, but saves a lot of work downstream and improves the quality of the work in the process.

Remember, documentation is not just about computer code or statistical data: any analysis that processes information needs a thorough account of that processing, showing how you arrived at the conclusions you present in the text. This includes text analysis and qualitative interview data.
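As a rough sketch of documenting concurrently, each processing step can record what it did at the moment it runs, so the trail from raw material to reported results is written as a by-product of the analysis rather than reconstructed afterwards. The `ProcessingLog` class, the `drop_missing` step and the toy records are hypothetical; the same pattern could log, say, which interview transcripts were excluded and why.

```python
import json
from datetime import datetime, timezone


class ProcessingLog:
    """Collects a record of every processing step, in the order they ran."""

    def __init__(self):
        self.steps = []

    def record(self, name, description, n_in, n_out, **params):
        self.steps.append({
            "step": name,
            "description": description,
            "rows_in": n_in,
            "rows_out": n_out,
            "parameters": params,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })


def drop_missing(rows, column, log):
    """Remove rows where `column` is 'NA', and log how many were removed."""
    kept = [r for r in rows if r[column] != "NA"]
    log.record("drop_missing", f"Dropped rows with NA in '{column}'",
               len(rows), len(kept), column=column)
    return kept


log = ProcessingLog()
data = [{"id": 1, "answer": "NA"}, {"id": 2, "answer": "yes"}]  # toy records
clean = drop_missing(data, "answer", log)

# The log can be saved next to the results and cited in the methods section
print(json.dumps(log.steps, indent=2))
```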

Sharing

Once you have everything done and would like to share your results, how do you do it?
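One common convention when depositing data and code is to include a manifest of file checksums, so that anyone who downloads the material can check they received exactly what you shared. A minimal sketch, assuming all shared files sit in a single folder; the file names and the `MANIFEST.json` format are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large data files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(folder):
    """Record a checksum for every file in the folder except the manifest."""
    folder = Path(folder)
    manifest = {p.name: sha256_of(p) for p in sorted(folder.iterdir())
                if p.is_file() and p.name != "MANIFEST.json"}
    (folder / "MANIFEST.json").write_text(json.dumps(manifest, indent=2))
    return manifest


def verify_manifest(folder):
    """Check that every listed file is present and unchanged."""
    folder = Path(folder)
    manifest = json.loads((folder / "MANIFEST.json").read_text())
    return all(sha256_of(folder / name) == digest
               for name, digest in manifest.items())


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "data.csv").write_text("s4,s9\n4,46\n")
    Path(tmp, "codebook.json").write_text("{}")
    manifest = write_manifest(tmp)
    ok = verify_manifest(tmp)

print(sorted(manifest))  # ['codebook.json', 'data.csv']
print(ok)                # True
```

Many data repositories compute checksums for you on upload; a manifest of your own is mainly useful when material is shared through less formal channels.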

Disseminating

The workflow thinking described above greatly eases the work of dissemination. Not only are the different parts of your research properly documented, linkable, shareable and citable, but integrated development environments also let you produce many kinds of output from the same source.
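A minimal illustration of "many kinds of output from the same source": the results are held once, as data, and rendered into each format on demand, so a number corrected in one place is corrected everywhere. The result values and function names are hypothetical placeholders; tools such as R Markdown or Jupyter apply the same idea to whole documents.

```python
import csv
import io

# Hypothetical summary results, stored once as the single source
RESULTS = [
    {"group": "A", "n": 120, "mean": 3.4},
    {"group": "B", "n": 95, "mean": 2.9},
]


def to_markdown(rows):
    """Render rows as a Markdown table (e.g. for a blog post or README)."""
    headers = list(rows[0])
    lines = ["| " + " | ".join(headers) + " |",
             "| " + " | ".join("---" for _ in headers) + " |"]
    for r in rows:
        lines.append("| " + " | ".join(str(r[h]) for h in headers) + " |")
    return "\n".join(lines)


def to_latex(rows):
    """Render rows as a LaTeX tabular (e.g. for the manuscript)."""
    headers = list(rows[0])
    body = [" & ".join(headers) + r" \\ \hline"]
    body += [" & ".join(str(r[h]) for h in headers) + r" \\" for r in rows]
    return ("\\begin{tabular}{" + "l" * len(headers) + "}\n"
            + "\n".join(body) + "\n\\end{tabular}")


def to_csv(rows):
    """Render rows as CSV (e.g. for a data deposit)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


print(to_markdown(RESULTS))
print(to_latex(RESULTS))
print(to_csv(RESULTS))
```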

The trade-offs

There are powerful, efficient tools at our disposal that can both guard against administrative burnout and improve the quality of our work and the workflow experience itself.

There is, however, a learning curve of varying steepness to integrate these things into an existing workflow. Is the time and effort worth it? Different researchers will have differing comfort levels regarding this.

Resources